The use of multilingual language models for tasks in low- and high-resource languages has been a success story in deep learning. Recently, Arabic has received widespread attention on account of its dialectal variance. While prior studies have tried to adapt multilingual models to dialectal variants of Arabic, this remains a challenging problem owing to the scarcity of monolingual dialectal data and of parallel translation data for such variants. It remains an open question whether limited dialectal data can be used to improve models trained on Arabic for its dialectal variants. First, we show that multilingual BERT (mBERT) incrementally pretrained on Arabic monolingual data takes less training time and yields accuracy comparable to our custom monolingual Arabic model, while beating existing models (by an average metric gain of +$6.41$). We then explore two continual pre-training methods: (1) continual fine-tuning on small amounts of dialectal data, and (2) training on parallel Arabic-English data with a Translation Language Modeling (TLM) loss. We show that both approaches improve performance on dialectal classification tasks (+$4.64$ avg. gain) when applied to monolingual models.
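For readers unfamiliar with the TLM objective, it concatenates a parallel sentence pair and applies masked-language-model training across both halves, so the model can attend across languages when recovering masked tokens. A minimal sketch of the input preparation, assuming already-tokenized ID sequences and a hypothetical special-token ID (MASK_ID):

```python
import torch

MASK_ID = 4  # hypothetical mask-token id; real value depends on the tokenizer

def make_tlm_example(src_ids, tgt_ids, mask_prob=0.15):
    """Prepare one Translation Language Modeling example: concatenate a
    parallel pair, then mask tokens in both halves so the model can use
    cross-lingual context to recover them."""
    ids = torch.cat([src_ids, tgt_ids])               # [src ; tgt]
    labels = torch.full_like(ids, -100)               # -100 = ignored by CE loss
    mask = torch.rand(ids.shape) < mask_prob
    labels[mask] = ids[mask]                          # predict only masked positions
    inputs = ids.clone()
    inputs[mask] = MASK_ID
    # language ids let the model tell the two halves apart
    lang_ids = torch.cat([torch.zeros_like(src_ids),  # 0 = Arabic
                          torch.ones_like(tgt_ids)])  # 1 = English
    return inputs, lang_ids, labels

src = torch.tensor([11, 12, 13])      # toy Arabic token ids
tgt = torch.tensor([21, 22, 23, 24])  # toy English token ids
inputs, lang_ids, labels = make_tlm_example(src, tgt)
```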
We present a multimodal (vision-and-language) benchmark for cooperative and heterogeneous multi-agent learning. We introduce a benchmark multimodal dataset whose tasks involve collaboration among multiple simulated heterogeneous robots in a rich multi-room environment. We provide an integrated learning framework, multimodal implementations of state-of-the-art multi-agent reinforcement learning techniques, and a consistent evaluation protocol. Our experiments investigate the impact of different modalities on multi-agent learning performance. We also introduce a simple message passing method between agents. The results indicate that multimodality introduces unique challenges for cooperative multi-agent learning, and there is substantial room for advancing multi-agent reinforcement learning methods in such settings.
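The abstract leaves the message-passing scheme unspecified; one simple broadcast-and-average form, sketched here with assumed dimensions, gives each agent a learned message vector that is averaged into every other agent's input:

```python
import torch
import torch.nn as nn

class MessagePassingAgents(nn.Module):
    """Toy broadcast-and-average message passing for n cooperative agents."""
    def __init__(self, obs_dim=32, msg_dim=8, act_dim=4):
        super().__init__()
        self.msg_head = nn.Linear(obs_dim, msg_dim)          # encode outgoing message
        self.policy = nn.Linear(obs_dim + msg_dim, act_dim)  # act on obs + inbox

    def forward(self, obs):                                  # obs: (n_agents, obs_dim)
        msgs = torch.tanh(self.msg_head(obs))                # (n_agents, msg_dim)
        n = obs.shape[0]
        # each agent receives the mean of the other agents' messages
        inbox = (msgs.sum(0, keepdim=True) - msgs) / max(n - 1, 1)
        return self.policy(torch.cat([obs, inbox], dim=-1))  # action logits

logits = MessagePassingAgents()(torch.randn(3, 32))          # 3 agents
```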
Language-guided embodied AI benchmarks that require an agent to navigate an environment and manipulate objects typically allow only one-way communication: the human user gives a natural language command, and the agent can only passively follow it. We introduce DialFRED, a dialogue-enabled embodied instruction following benchmark based on the ALFRED benchmark. DialFRED allows an agent to actively ask questions of the human user; the agent uses the additional information in the user's responses to better complete its task. We release a human-annotated dataset with 53K task-relevant questions and answers, together with an oracle that can answer the questions. To tackle DialFRED, we propose a questioner-performer framework in which the questioner is pretrained on the human-annotated data and fine-tuned with reinforcement learning. We make DialFRED publicly available and encourage researchers to propose and evaluate their solutions for building dialogue-enabled embodied agents.
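The abstract only says the questioner is fine-tuned with reinforcement learning; a minimal REINFORCE-style sketch of such a step, with a hypothetical reward signal from task completion, could look like this:

```python
import torch
import torch.nn as nn

questioner = nn.Linear(64, 5)  # toy stand-in: 5 question types incl. "ask nothing"
opt = torch.optim.Adam(questioner.parameters(), lr=1e-4)

def reinforce_step(state, reward_fn):
    """One REINFORCE update: sample a question, observe the task reward,
    and increase the log-probability of questions that helped."""
    dist = torch.distributions.Categorical(logits=questioner(state))
    action = dist.sample()
    reward = reward_fn(action)          # e.g. +1 if the performer then succeeds
    loss = -dist.log_prob(action) * reward
    opt.zero_grad()
    loss.backward()
    opt.step()
    return action, reward

reinforce_step(torch.randn(64), lambda a: 1.0)  # dummy state and reward
```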
Learning-based methods for training embodied agents typically require a large number of high-quality scenes that contain realistic layouts and support meaningful interactions. However, current simulators for Embodied AI (EAI) challenges only provide simulated indoor scenes with a limited number of layouts. This paper presents Luminous, the first research framework that employs state-of-the-art indoor scene synthesis algorithms to generate large-scale simulated scenes for Embodied AI challenges. Furthermore, we automatically and quantitatively evaluate the quality of the generated indoor scenes by their ability to support complex household tasks. Luminous incorporates a novel scene generation algorithm, Constrained Stochastic Scene Generation (CSSG), which achieves competitive performance against human-designed scenes. Within Luminous, the EAI task executor, the task instruction generation module, and the video rendering toolkit can collectively generate massive multimodal datasets of new scenes for training and evaluating Embodied AI agents. Extensive experimental results demonstrate the effectiveness of the data generated by Luminous, enabling comprehensive evaluation of embodied agents in terms of generalization and robustness.
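The abstract does not describe CSSG itself; purely as an illustration of what "constrained stochastic" generation can mean, here is a toy rejection-sampling placement loop in which the single constraint, non-overlap, is an assumption rather than the paper's constraint set:

```python
import random

def place_objects(room_w, room_h, sizes, max_tries=1000):
    """Sample axis-aligned object positions until they satisfy a
    non-overlap constraint: a toy stand-in for constrained sampling."""
    placed = []
    for w, h in sizes:
        for _ in range(max_tries):
            x, y = random.uniform(0, room_w - w), random.uniform(0, room_h - h)
            box = (x, y, x + w, y + h)
            if all(box[2] <= b[0] or b[2] <= box[0] or
                   box[3] <= b[1] or b[3] <= box[1] for b in placed):
                placed.append(box)   # accepted: no overlap with earlier objects
                break
        else:
            raise RuntimeError("constraint unsatisfiable within budget")
    return placed

layout = place_objects(5.0, 4.0, [(1.0, 0.6), (0.8, 0.8), (2.0, 0.9)])
```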
To accomplish tasks efficiently in multi-robot systems, a problem that must be addressed is Simultaneous Localization and Mapping (SLAM). LiDAR (Light Detection and Ranging) is used in many SLAM solutions due to its excellent accuracy, but its performance degrades in feature-sparse environments such as tunnels or long corridors. Centralized SLAM solves the problem with a cloud server, which requires substantial computational resources and lacks robustness against failures of the central node. To address these issues, we propose a distributed SLAM solution that estimates the trajectories of a group of robots using ultra-wideband (UWB) ranging and odometry measurements. The proposed approach distributes the processing among the robot team and significantly mitigates the computational burden that arises in centralized SLAM. Our solution determines the relative pose (also known as a loop closure) between two robots by minimizing the error of UWB range measurements taken at different positions when the robots are in close proximity. UWB provides a good distance measure under line-of-sight conditions, but retrieving a precise pose estimate remains a challenge due to noise and the unpredictable paths of the robots. To handle questionable loop closures, we use Pairwise Consistency Maximization (PCM) to examine loop-closure quality and perform outlier rejection. The filtered loop closures are then fused with odometry in a distributed pose graph optimization (DPGO) module to recover the full trajectories of the robot team. Extensive experiments were conducted to validate the effectiveness of the proposed approach.
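For intuition, PCM keeps the largest set of mutually consistent inter-robot measurements. The sketch below reduces poses to scalar displacements and replaces the maximum-clique search with a greedy pass, so it is a simplification of the idea rather than the paper's algorithm:

```python
def pairwise_consistent(z_i, z_j, odom_a_ij, odom_b_ij, thresh=0.5):
    """Closures i and j agree if traversing the measurement cycle gives a
    near-zero residual. Scalars stand in for SE(3) poses: z_* are
    robot-A-to-robot-B offsets, odom_*_ij are each robot's odometry
    displacements between the two measurement times."""
    residual = (z_j - z_i) - (odom_b_ij - odom_a_ij)
    return abs(residual) < thresh

def greedy_pcm(closures, pos_a, pos_b):
    """Greedy stand-in for PCM's maximum-clique search: keep each closure
    only if it is pairwise consistent with every closure kept so far."""
    kept = []
    for i, z in enumerate(closures):
        if all(pairwise_consistent(closures[k], z,
                                   pos_a[i] - pos_a[k],
                                   pos_b[i] - pos_b[k]) for k in kept):
            kept.append(i)
    return kept

closures = [1.0, 1.1, 5.0]  # third inter-robot measurement is an outlier
print(greedy_pcm(closures, pos_a=[0.0, 2.0, 4.0], pos_b=[0.0, 2.0, 4.0]))
# -> [0, 1]
```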
Due to labor shortages and rising labor costs in the apple industry, there is an urgent need for robotic systems that can efficiently and autonomously harvest apples. In this paper, we present a system overview and the algorithm design of our recently developed robotic apple harvester prototype. Our robotic system is enabled by the close integration of several core modules, including visual perception, planning, and control. This paper covers the main methods and advancements in deep learning-based multi-view fruit detection and localization, unified picking and dropping planning, and dexterous manipulation control. Indoor and field experiments were conducted to evaluate the performance of the developed system, which achieved an average picking time of 3.6 seconds per apple. This is a significant improvement over other reported apple harvesting robots, whose picking times range from 7 to 10 seconds per apple. The current prototype shows promising performance toward further development of efficient and automated apple harvesting technology. Finally, limitations of the current system and future work are discussed.
Recent metric-learning-based methods have achieved great progress in few-shot learning. However, most of them are limited to image-level representations, which fail to properly handle intra-class variation and spatial knowledge, and thus produce undesirable performance. In this paper, we propose a deep bias rectify network (DBRN) to fully exploit the spatial information present in the feature representation structure. We first employ a bias rectify module to alleviate the adverse effects caused by intra-class variation. The bias rectify module is able to focus on features that are more discriminative for classification by assigning them different weights. To make full use of the training data, we design a prototype augmentation mechanism that can make the prototypes generated from the support set more representative. To validate the effectiveness of our method, we conduct extensive experiments on various popular few-shot classification benchmarks, and our method outperforms state-of-the-art approaches.
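The abstract does not give the weighting rule; one plausible sketch of a bias-rectified prototype downweights support samples that lie far from the class mean. This is a guess at the flavor of the module, not the paper's formulation:

```python
import torch

def rectified_prototype(support, temperature=1.0):
    """Compute a class prototype as a weighted mean of support features,
    giving lower weight to outlying samples so that intra-class variation
    biases the prototype less."""
    mean = support.mean(dim=0, keepdim=True)          # naive prototype
    dists = (support - mean).pow(2).sum(dim=1)        # distance of each shot to it
    weights = torch.softmax(-dists / temperature, 0)  # closer shots weigh more
    return (weights.unsqueeze(1) * support).sum(dim=0)

proto = rectified_prototype(torch.randn(5, 64))       # 5-shot, 64-dim features
```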
Unsupervised domain adaptation (UDA) for semantic segmentation is a promising task that frees people from heavy annotation work. However, domain discrepancies in low-level image statistics and high-level contexts compromise segmentation performance on the target domain. A key idea for tackling this problem is to perform image-level and feature-level adaptation jointly. Unfortunately, the existing literature lacks such unified approaches for UDA tasks. This paper proposes a novel UDA pipeline for semantic segmentation that unifies image-level and feature-level adaptation. Concretely, for image-level domain shifts, we propose a global photometric alignment module and a global texture alignment module that align images in the source and target domains in terms of image-level properties. For feature-level domain shifts, we perform global manifold alignment by projecting pixel features from both domains onto the feature manifold of the source domain; we further regularize category centers in the source domain through a category-oriented triplet loss, and perform target-domain consistency regularization over augmented target-domain images. Experimental results demonstrate that our pipeline significantly outperforms previous methods. On the commonly tested GTA5$\rightarrow$Cityscapes task, our proposed method, using Deeplab V3+ as the backbone, surpasses the previous SOTA by 8%, achieving 58.2% mIoU.
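As a concrete illustration of a triplet loss over category centers (the margin, distance, and hard-negative choice below are assumptions rather than the paper's exact formulation), each pixel feature is pulled toward its own category center and pushed away from the nearest other center:

```python
import torch
import torch.nn.functional as F

def category_triplet_loss(feats, labels, centers, margin=0.5):
    """feats: (N, D) pixel features; labels: (N,) category ids;
    centers: (C, D) per-category centers. Pull each feature toward its own
    center and push it away from the hardest non-matching center."""
    d = torch.cdist(feats, centers)                    # (N, C) distances
    pos = d.gather(1, labels.unsqueeze(1)).squeeze(1)  # distance to own center
    d_other = d.scatter(1, labels.unsqueeze(1), float('inf'))
    neg = d_other.min(dim=1).values                    # nearest other center
    return F.relu(pos - neg + margin).mean()

loss = category_triplet_loss(torch.randn(100, 16),
                             torch.randint(0, 19, (100,)),  # 19 categories
                             torch.randn(19, 16))
```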
Compressed videos often exhibit visually annoying artifacts, known as Perceivable Encoding Artifacts (PEAs), which dramatically degrade video visual quality. Subjective and objective measures capable of identifying and quantifying the various types of PEAs are critical for improving visual quality. In this paper, we investigate the influence of four spatial PEAs (blurring, blocking, bleeding, and ringing) and two temporal PEAs (flickering and floating) on video quality. For spatial artifacts, we propose a visual saliency model with low computational cost and higher consistency with human visual perception. For temporal artifacts, we improve the self-attention-based TimeSFormer to detect them. Based on these six types of PEAs, we propose a quality metric called Saliency-Aware Spatio-Temporal Artifacts Measurement (SSTAM). Experimental results demonstrate that the proposed method outperforms state-of-the-art metrics. We believe SSTAM will be beneficial for optimizing video coding techniques.
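The abstract does not give SSTAM's pooling rule; as a hedged illustration of "saliency-aware" measurement, one can weight per-pixel artifact strength by a saliency map before pooling. The artifact-type weights and maps below are placeholders, not the paper's values:

```python
import numpy as np

def saliency_weighted_score(artifact_maps, saliency, type_weights):
    """artifact_maps: dict of per-pixel artifact-strength maps (H, W), one
    per PEA type; saliency: (H, W) visual-saliency map. Artifacts falling
    in salient regions contribute more to the final score."""
    s = saliency / (saliency.sum() + 1e-8)   # normalize to a spatial distribution
    return sum(type_weights[k] * float((m * s).sum())
               for k, m in artifact_maps.items())

maps = {k: np.random.rand(64, 64) for k in
        ["blur", "block", "bleed", "ring", "flicker", "float"]}
score = saliency_weighted_score(maps, np.random.rand(64, 64),
                                {k: 1 / 6 for k in maps})
```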
Image virtual try-on aims at replacing the clothes in a personal image with a garment image (in-shop clothes), and has attracted increasing attention from the multimedia and computer vision communities. Prior methods successfully preserve the character of the clothing images; however, occlusion remains a pernicious effect for realistic virtual try-on. In this work, we first present a comprehensive analysis of occlusions and categorize them into two aspects: i) Inherent-Occlusion: the ghost of the former cloth still exists in the try-on image; ii) Acquired-Occlusion: the target cloth warps onto an unreasonable body part. Based on this in-depth analysis, we find that the occlusions can be simulated by a novel semantically-guided mixup module, which generates semantic-specific occluded images that, together with the try-on images, facilitate training a de-occlusion try-on (DOC-VTON) framework. Specifically, DOC-VTON first conducts sharpened semantic parsing on the try-on person. Aided by semantic guidance and a pose prior, textures of various complexity are selectively blended with human parts in a copy-and-paste manner. Then, a Generative Module (GM) is utilized to synthesize the final try-on image while jointly learning de-occlusion. Compared with state-of-the-art methods, DOC-VTON achieves better perceptual quality by reducing occlusion effects.
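As a rough picture of what a semantics-gated copy-and-paste blend could compute (the mask source, blend ratio, and function below are illustrative assumptions, not the paper's module):

```python
import numpy as np

def semantic_mixup(tryon_img, texture_patch, part_mask, alpha=0.8):
    """Paste garment texture onto a chosen body-part region of the try-on
    image to synthesize an occluded training sample.
    tryon_img, texture_patch: (H, W, 3); part_mask: (H, W) in {0, 1},
    e.g. an arm or torso region from a semantic parser."""
    m = part_mask[..., None].astype(np.float32)   # broadcast mask over channels
    occluded = (1 - m) * tryon_img + m * (alpha * texture_patch
                                          + (1 - alpha) * tryon_img)
    return occluded.astype(tryon_img.dtype)

img = np.random.rand(256, 192, 3).astype(np.float32)   # toy try-on image
tex = np.random.rand(256, 192, 3).astype(np.float32)   # toy garment texture
mask = np.random.rand(256, 192) > 0.7                  # toy body-part mask
occ = semantic_mixup(img, tex, mask)
```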